NSF PAR Search | NSF Public Access Repository

Modeling and Scheduling of Fusion Patterns in Autonomous Driving Systems

Sobhani, Hoora; Kim, Hyoseung (November 2025, The 33rd International Conference on Real-Time Networks and Systems (RTNS))

In Autonomous Driving Systems (ADS), Directed Acyclic Graphs (DAGs) are widely used to model complex data dependencies and inter-task communication. However, existing DAG scheduling approaches oversimplify data fusion tasks by assuming fixed triggering mechanisms, failing to capture the diverse fusion patterns found in real-world ADS software stacks. In this paper, we propose a systematic framework for analyzing various fusion patterns and their performance implications in ADS. Our framework models three distinct fusion task types: timer-triggered, wait-for-all, and immediate fusion, which comprehensively represent real-world fusion behaviors. Our Integer Linear Programming (ILP)-based approach enables an optimization of multiple real-time performance metrics, including reaction time, time disparity, age of information, and response time, while generating deterministic offline schedules directly applicable to real platforms. Evaluation using real-world ADS case studies, Raspberry Pi implementation, and randomly generated DAGs demonstrates that our framework handles diverse fusion patterns beyond the scope of existing work, and achieves substantial performance improvements in comparable scenarios.
Free, publicly-accessible full text available November 5, 2026
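
As a rough illustration of how the three fusion task types named in the abstract above differ, the sketch below computes when a fusion task would fire for one set of input arrivals. The function names, the sensor names, and the integer-millisecond timestamps are assumptions made for this example. The trigger rules are a plain reading of the terms "timer-triggered", "wait-for-all", and "immediate", not the paper's formal model or its ILP formulation.

```python
from typing import Dict, List


def wait_for_all(arrivals_ms: Dict[str, int]) -> int:
    """Fire once every input has delivered a sample, i.e. at the latest arrival."""
    return max(arrivals_ms.values())


def immediate(arrivals_ms: Dict[str, int]) -> List[int]:
    """Fire on each input arrival, reusing the newest data held for the other inputs."""
    return sorted(arrivals_ms.values())


def timer_triggered(period_ms: int, horizon_ms: int) -> List[int]:
    """Fire at fixed timer ticks up to the horizon, independent of input arrivals."""
    return list(range(period_ms, horizon_ms + 1, period_ms))


def time_disparity(arrivals_ms: Dict[str, int]) -> int:
    """Largest gap between the timestamps of the samples fused together."""
    return max(arrivals_ms.values()) - min(arrivals_ms.values())


# Hypothetical arrival times (ms) for three sensor inputs of one fusion task.
arrivals = {"camera": 12, "lidar": 30, "radar": 18}
```

With these assumed arrivals, wait-for-all fires once at the slowest input, immediate fusion fires three times, and the timer-triggered variant fires on its own period regardless of when data shows up.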
Theory-Guided Adaptive Scheduling for ROS 2

Enright, Daniel; Sobhani, Hoora; Kim, Hyoseung (November 2025, The 33rd International Conference on Real-Time Networks and Systems (RTNS))

This paper presents Latency Management Executor (LaME), a theory-guided adaptive scheduling framework that enhances real-time performance in ROS 2 through dynamic resource allocation and hybrid priority-driven scheduling. LaME introduces the concept of threadclasses to dynamically adjust system configurations, ensuring response-time guarantees for real-time chains while maintaining starvation freedom for best-effort chains. By implementing adaptive resource allocation and continuous runtime monitoring, LaME provides robust response times even under fluctuating workloads and resource constraints. We implement our framework for the Autoware reference system and perform our evaluation on an Nvidia Jetson platform. Our results demonstrate that LaME successfully adapts to changing resource availability and workload surges, and effectively balances real-time guarantees with overall system throughput.
Free, publicly-accessible full text available November 5, 2026
ECLIP: Energy-efficient and Practical Co-Location of ML Inference on Spatially Partitioned GPUs

Quach, Ryan; Wang, Yidi; Jahanshahi, Ali; Wong, Daniel; Kim, Hyoseung (August 2025, IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED))

As AI inference becomes mainstream, research has begun to focus on improving the energy consumption of inference servers. Inference kernels commonly underutilize a GPU’s compute resources and waste power from idling components. To improve utilization and energy efficiency, multiple models can co-locate and share the GPU. However, typical GPU spatial partitioning techniques often experience significant overheads when reconfiguring spatial partitions, which can waste additional energy through repartitioning overheads or non-optimal partition configurations. In this paper, we present ECLIP, a framework to enable low-overhead energy-efficient kernel-wise resource partitioning between co-located inference kernels. ECLIP minimizes repartitioning overheads by pre-allocating pools of CU masked streams and assigns optimal CU assignments to groups of kernels through our resource allocation optimizer. Overall, ECLIP achieves an average of 13% improvement to throughput and 25% improvement to energy efficiency.
Free, publicly-accessible full text available August 6, 2026
BOXR: Body and head motion Optimization framework for eXtended Reality

https://doi.org/10.1109/RTSS62706.2024.00016

Zhang, Ziliang; Li, Zexin; Kim, Hyoseung; Liu, Cong (December 2024, IEEE)

Full Text Available
MII: A Multifaceted Framework for Intermittence-aware Inference and Scheduling

Zhang, Ziliang; Liu, Cong; Kim, Hyoseung (October 2024, 2024 ACM SIGBED International Conference on Embedded Software (EMSOFT))

The concurrent execution of deep neural networks (DNN) inference tasks on intermittently-powered batteryless devices (IPDs) has recently garnered much attention due to its potential in a broad range of smart sensing applications. While the checkpointing mechanisms (CMs) provided by the state-of-the-art make this possible, scheduling inference tasks on IPDs is still a complex problem due to significant performance variations across DNN layers and CM choices. This complexity is further accentuated by dynamic environmental conditions and inherent resource constraints of IPDs. To tackle these challenges, we present MII, a framework designed for intermittence-aware inference and scheduling on IPDs. MII formulates the shutdown and live time functions of an IPD from profiling data, which our offline intermittence-aware search scheme uses to find optimal layer-wise CMs for each task. At runtime, MII enhances job success rates by dynamically making scheduling decisions to mitigate workload losses from power interruptions and adjusting these CMs in response to actual energy patterns. Our evaluation demonstrates the superiority of MII over the state-of-the-art. In controlled environments, MII achieves an average increase of 21% and 39% in successful jobs under stable and dynamic energy patterns. In real-world settings, MII achieves 33% and 24% more successful jobs indoors and outdoors.
Full Text Available
OpenSense: An Open-World Sensing Framework for Incremental Learning and Dynamic Sensor Scheduling on Embedded Edge Devices

https://doi.org/10.1109/JIOT.2024.3385016

Bukhari, Abdulrahman; Hosseinimotlagh, Seyedmehdi; Kim, Hyoseung (August 2024, IEEE Internet of Things Journal)

Recent advances in Internet of Things (IoT) technologies have sparked significant interest toward developing learning-based sensing applications on embedded edge devices. These efforts, however, are being challenged by the complexities of adapting to unforeseen conditions in an open-world environment, mainly due to the intensive computational and energy demands exceeding the capabilities of edge devices. In this article, we propose OpenSense, an open-world time-series sensing framework for making inferences from time-series sensor data and achieving incremental learning on an embedded edge device with limited resources. The proposed framework is able to achieve two essential tasks, inference and incremental learning, eliminating the necessity for powerful cloud servers. In addition, to secure enough time for incremental learning and reduce energy consumption, we need to schedule sensing activities without missing any events in the environment. Therefore, we propose two dynamic sensor scheduling techniques: 1) a class-level period assignment scheduler that finds an appropriate sensing period for each inferred class and 2) a Q-learning-based scheduler that dynamically determines the sensing interval for each classification moment by learning the patterns of event classes. With this framework, we discuss the design choices made to ensure satisfactory learning performance and efficient resource usage. Experimental results demonstrate the ability of the system to incrementally adapt to unforeseen conditions and to efficiently schedule to run on a resource-constrained device.
Full Text Available
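
The Q-learning-based scheduler mentioned in the abstract above can be pictured with a generic tabular Q-learning update. This is a minimal sketch under stated assumptions, not the OpenSense implementation: the state is taken to be the last inferred event class, the actions are a hypothetical set of candidate sensing intervals (`INTERVALS_S`), and the reward value is supplied by the caller.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical candidate sensing intervals in seconds (not from the article).
INTERVALS_S: List[int] = [1, 5, 10]


def make_q_table() -> Dict[str, List[float]]:
    """Q-table mapping a state (last inferred class) to one value per interval."""
    return defaultdict(lambda: [0.0] * len(INTERVALS_S))


def q_update(q: Dict[str, List[float]], state: str, action: int, reward: float,
             next_state: str, alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Standard Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', .)."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def best_interval(q: Dict[str, List[float]], state: str) -> int:
    """Greedy choice: the interval with the highest learned value for this state."""
    values = q[state]
    return INTERVALS_S[values.index(max(values))]
```

A reward that favors long intervals when no event is missed, and penalizes missed events, would steer such a table toward sparse sensing during quiet classes. That reward design is an assumption here.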
Principled Mining, Forecasting, and Monitoring of Honeybee Time Series with EBV+

https://doi.org/10.1145/3719014

Hossain, Mst Shamima; Faloutsos, Christos; Baer, Boris; Kim, Hyoseung; Tsotras, Vassilis J (June 2025, ACM Transactions on Knowledge Discovery from Data)

Honeybees, as natural crop pollinators, play a significant role in biodiversity and food production for human civilization. Bees actively regulate hive temperature (homeostasis) to maintain a colony’s proper functionality. Deviations from usual thermoregulation behavior due to external stressors (e.g., extreme environmental temperature, parasites, pesticide exposure) indicate an impending colony collapse. Anticipating such threats by forecasting hive temperature and finding changes in temperature patterns would allow beekeepers to take early preventive measures and avoid critical issues. In that case, how can we model bees’ thermoregulation behavior for an interpretable and effective hive monitoring system? In this article, we propose the principled Electronic Bee-Veterinarian Plus (EBV+) method based on the thermal diffusion equation and a novel “sigmoid” feedback-loop (P) controller for analyzing hive health with the following properties: (i) it is effective on multiple, real-world beehive time sequences (recorded and streaming), (ii) it is explainable with only a few parameters (e.g., hive health factor) that beekeepers can easily quantify and trust, (iii) it issues proactive alerts to beekeepers before any potential issue affecting homeostasis becomes detrimental, and (iv) it is scalable with a time complexity of \(O(t)\) for reconstructing and \(O(t\times m)\) for finding \(m\) cuts of a sequence with \(t\) time-ticks. Experimental results on multiple real-world time sequences showcase the potential and practical feasibility of EBV+. Our method yields accurate forecasting (up to 72% improvement in RMSE) with up to 600 times fewer parameters compared to baselines (ARX, seasonal ARX, Holt-Winters, and DeepAR), as well as detects discontinuities and raises alerts that coincide with domain experts’ opinions. Moreover, EBV+ is scalable and fast, taking less than 1 minute on a stock laptop to reconstruct 2 months of sensor data.
Free, publicly-accessible full text available June 30, 2026
Exploring Partitioned and Semi-partitioned Callback Scheduling on ROS 2 Multi-threaded Executors

Sobhani, Hoora; Enright, Daniel; Deshpande, Tejas Milind; Kim, Hyoseung (July 2024, ECRTS)

In recent studies aimed at enhancing the analyzability and real-time performance of ROS 2, there has been insufficient emphasis on the importance of different scheduling options, including global, partitioned, and semi-partitioned approaches, particularly when multiple CPU cores are involved. In this work, we enabled partitioned and semi-partitioned scheduling for ROS 2 multi-threaded executors and discussed the opportunities and potential issues associated with them.
Full Text Available
PAAM: A Framework for Coordinated and Priority-Driven Accelerator Management in ROS 2

https://doi.org/10.1109/RTAS61025.2024.00015

Enright, Daniel; Xiang, Yecheng; Choi, Hyunjong; Kim, Hyoseung (May 2024, IEEE)

This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without making changes to underlying accelerator device drivers. The paper shows that PAAM also offers a theoretical analysis that can upper bound the worst-case response time of safety-critical callback chains that necessitate accelerator access. This paper also demonstrates that complex robotic systems with extensive accelerator usage that are integrated with PAAM may achieve up to a 91% reduction in end-to-end response time of their critical callback chains.
Full Text Available
GCAPS: GPU Context-Aware Preemptive Priority-Based Scheduling for Real-Time Tasks

https://doi.org/10.4230/LIPIcs.ECRTS.2024.14

Wang, Yidi; Liu, Cong; Wong, Daniel; Kim, Hyoseung (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much research has been conducted in the real-time research community, several limitations persist, including the absence or limited availability of GPU-level preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose GCAPS, a GPU Context-Aware Preemptive Scheduling approach for real-time GPU tasks. Our approach exerts control over GPU context scheduling at the device driver level and enables preemption of GPU execution based on task priorities by simply adding one-line macros to GPU segment boundaries. In addition, we provide a comprehensive response time analysis of GPU-using tasks for both our proposed approach as well as the default Nvidia GPU driver scheduling that follows a work-conserving round-robin policy. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and response time. The results highlight significant improvements over prior work as well as the default scheduling approach, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms.
Full Text Available
